We used a cost-effective, non-invasive method to obtain high-quality DNA from buccal epithelial cells 
(BEC) of premature infants for genomic analysis. DNA from BEC was obtained from premature 
infants with gestational age < 36 weeks. Short tandem repeat (STR) analysis was performed simultaneously 
on DNA obtained from the buccal swabs and blood of the same patient. The STR profiles 
demonstrated that the samples originated from the same individual and excluded any contamination by 
external DNA. Whole exome sequencing was performed on DNA obtained from BEC of premature 
infants with and without necrotizing enterocolitis, and yielded total numbers of reads 
and variants comparable to those obtained from healthy blood donors. We provide proof of 
concept that BEC is a reliable and preferable source of DNA for high-throughput sequencing in 
premature infants. 



Understanding the genetic basis of a disease has vast potential benefit to healthcare. Obtaining genetic 
material for analysis is thus essential and has broad implications for understanding the pathogenesis of 
disease and for potentially designing individualized therapies. To this end, building repositories of genetic 
material may prove to be a useful tool. Several molecular genetic tests can be performed using dried blood spots, as 
is the case with statewide newborn screens. Other, more extensive tests, such as chromosome analysis, fluorescence 
in situ hybridization (FISH), microarray and PCR-based genotyping assays, require whole blood samples. 
However, blood sampling is invasive and expensive, and it has particular limitations in preterm neonates. For these 
infants, every milliliter of blood is significant, and relatively small volumes can constitute a large percentage of 
total blood volume. Additionally, obtaining blood for laboratory analysis may cause pain or discomfort, so blood 
should only be collected when absolutely necessary. 

The use of innovative and minimally invasive practices in pediatric and neonatal populations remains 
important. Buccal cells have previously been discredited as a source of reliable DNA in neonates because of 
maternal epithelial cell contamination. The purpose of this study is to evaluate the efficacy of the already 
tried-and-tested buccal swab method for obtaining high-quality DNA for high-throughput genomic analysis. This 
analysis includes short tandem repeat (STR) analysis, TaqMan allelic discrimination assays, single nucleotide 
polymorphism (SNP) genotyping by PCR-RFLP and, more importantly, whole exome sequencing (WES). 

Results 

Genomic DNA was successfully isolated from all samples (170 buccal brushes from 85 patients and 61 whole 
blood samples). Thirty-five (41%) premature neonates were extremely low birth weight, 33 (39%) were very low 
birth weight and 14 neonates (16.5%) were considered low birth weight (Table 1). High-quality DNA was 
obtained from buccal epithelial cells (BEC), with an average concentration of 255.22 ng/µL (range: 89.5 to 
421 ng/µL), and from whole blood (WB) (34.43 ng/µL; range: 5.5 to 182.8 ng/µL). Interestingly, the DNA yield 
from BEC, per set of experiments, was significantly higher than that from WB (p < 0.0001). 

Table 1 | Patient Demographics, Gestational Age and Birth Weight in Premature Infants 

Gestational age (weeks)    Birth weight (kg)    Patients (N = 85) 
24 ± 2                     0.5-0.9              35 
28 ± 2                     0.6-1.7              33 
32 ± 2                     1.0-1.8              14 
34-36                      2.6-2.8              3 

DNA from the buccal swabs was obtained from thirty-five (41%) premature infants with extremely 
low birth weight (<1 kg), 33 (39%) with very low birth weight and 14 (16.5%) with low birth 
weight (<2 kg). 

To confirm that the DNA obtained from BEC was free of any 
external DNA contamination, we performed short tandem repeat 
(STR) analysis on 12 DNA pairs (12 BEC and 12 WB) using 
AmpFlSTR® Identifiler® Plus, and the results were analyzed with 
GeneMarker 2.4 (Softgenetics, PA). Full, single-source profiles were 
obtained from all samples, and the profile of each BEC sample 
matched the WB sample from the same individual at all 15 loci and 
amelogenin (Figure 1). These results confirmed that the DNA 
obtained from the buccal swabs was not contaminated by any 
external DNA (Supplementary Table S1 online). Concomitantly, the 
same 12 DNA pairs were tested using six TaqMan probe-based allelic 
discrimination assays for the detection of single nucleotide 
polymorphisms (SNPs). The genetic profiles obtained from BEC were 
100% concordant with those obtained from WB cells (Supplementary 
Table S2 online). We then used these DNA samples to amplify a 
485 bp sequence in the regulatory region of the TRIM21 gene 
containing the polymorphic Bgl II site (C/T). PCR-RFLP reactions 
were successfully performed for all DNA from BEC samples 
(Supplementary Figure S1 online). 

Whole exome sequencing is the state-of-the-art means of genomic 
analysis. To evaluate whether the quality of the buccal 
epithelial cell DNA in healthy and pathological cases was adequate 
for next generation DNA sequencing technologies, we performed 
whole exome sequencing on four samples: two healthy premature 
infants and two infants with necrotizing enterocolitis (Bell's Stage 
III). The total numbers of reads for controls #1 and #2 and patients 
#1 and #2 were 18,448,882, 24,206,718, 16,874,844 and 31,507,076, 
respectively. The average coverage was 17.1×. The total 
number of coding variants that passed the analysis parameters 
was 18,649 ± 1,781 (Table 2). Our data are consistent with 
laboratory results obtained from the whole blood of healthy donors 
(unpublished data) and with other previously published studies. 
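
As a consistency check on the reported variant summary, the following minimal sketch recomputes the per-sample totals and the mean ± standard deviation from the counts in Table 2. It assumes that the reported 18,649 ± 1,781 is the mean and sample standard deviation of the four BEC samples (two controls and two patients); the text does not state this explicitly. The variable names are ours.

```python
from statistics import mean, stdev

# Variant counts transcribed from Table 2.
# Column order: Control #1, Control #2, Patient #1, Patient #2, Healthy Donor.
table2 = {
    "Missense": [8541, 8411, 8417, 7007, 7927],
    "Nonsense": [72, 60, 71, 61, 59],
    "Frameshift insertion/deletion": [78, 86, 82, 63, 81],
    "Synonymous": [10738, 10340, 10764, 8598, 9444],
    "Splicing": [157, 158, 173, 134, 195],
    "Non-frameshift insertion/deletion": [145, 133, 166, 139, 169],
}
reported_totals = [19731, 19188, 19673, 16002, 17875]

# Sum each column (sample) across the six variant categories.
column_sums = [sum(column) for column in zip(*table2.values())]

# Assumption: the summary statistic covers only the four BEC samples.
bec_totals = column_sums[:4]
bec_mean = mean(bec_totals)
bec_sd = stdev(bec_totals)  # sample standard deviation (n - 1)

print(column_sums == reported_totals)
print(bec_mean, round(bec_sd))
```

Under that assumption the category counts add up to the reported totals for every column, and the four BEC totals give a mean of 18,648.5 with a sample standard deviation of about 1,781, which agrees with the value reported in the text.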

Our data provide proof of concept that the already tried-and- 
tested buccal swab method is reliable, inexpensive, non-invasive 
and suitable for biobanking of genomic material. The DNA from 
BEC meets the quantitative and qualitative requirements for high- 
throughput screening and next generation sequencing technologies. 

Figure 1 | Electropherogram of four STR loci. The STR electropherograms were obtained from the amplification of WB (A) and BEC (B) samples. 
Across the profile, two or fewer alleles are present at each locus, and the peak height ratio between sister alleles at heterozygous loci is within the expected range, 
indicating that both are single-source samples (i.e. absence of contamination). The two profiles are an exact match, demonstrating that the samples 
originated from the same individual. 

Table 2 | Whole Exome Sequencing (WES) 

Variant type                         Control #1   Control #2   Patient #1   Patient #2   Healthy Donor* 
Missense                             8,541        8,411        8,417        7,007        7,927 
Nonsense                             72           60           71           61           59 
Frameshift Insertion/Deletion        78           86           82           63           81 
Synonymous                           10,738       10,340       10,764       8,598        9,444 
Splicing                             157          158          173          134          195 
Non-frameshift Insertion/Deletion    145          133          166          139          169 
Total Variants                       19,731       19,188       19,673       16,002       17,875 

DNA from BEC was used to perform WES. DNA from either healthy premature infants (Controls) or premature infants with necrotizing enterocolitis (Patients) showed comparable results, indicating the 
adequacy of the quality and quantity of DNA obtained from buccal swabs for genomic analysis. * WES data from BEC are comparable with those obtained from a healthy blood donor (unpublished data). 



Discussion 

Our cohort of eighty-five premature infants is larger than that of any 
previously published study on the use of BEC for DNA extraction in this 
population and is the only one focused exclusively on premature 
infants. 

Although whole blood samples provide generous amounts of good 
quality DNA, their collection remains invasive and expensive, and 
the technical difficulties associated with phlebotomy in small, sick 
preterm neonates often limit the volume of blood obtained and 
therefore reduce the possibility of genomic testing. Phlebotomy from 
a neonate requires a skilled practitioner, and DNA extraction from 
blood requires a large number of DNA purification columns (7 to 8 
purification columns for 750 µL of blood), which significantly 
increases the cost and the extraction time. In addition, blood 
drawing and the placement of peripheral and central vascular 
catheters can cause pain and discomfort, compromise skin integrity 
and increase the risk of infection in premature neonates. In 
contrast, BEC can be collected by any trained member of a clinical 
care or research team and do not require a particular extraction 
kit, which reduces the overall cost, and their collection does not 
carry the infection risk associated with venipuncture. 
Research dedicated to advancing the care of premature neonates 
has necessitated investigation into reliable sources of genomic 
DNA. This study successfully validated the use of BEC as a non- 
invasive and reliable source of genomic DNA for use in a variety of 
genetic assays. 

The issue of possible contamination always remains paramount. 
STR analysis is a reliable method to determine whether any 
contamination with external DNA exists. To confirm that the BEC and 
blood samples were from the same individual, we performed STR 
analysis of matched BEC and blood samples from 12 patients. 
Historically, STR analysis has been used to ensure that a prenatal 
fetal sample is not contaminated with maternal cells before the 
sample is assayed. It therefore constitutes a very sensitive method, 
able to detect the DNA from a single cell. All 15 STR loci and 
amelogenin showed matching profiles in the matched BEC and blood 
samples. This demonstrates that neither the BEC nor the blood 
samples were contaminated by maternal or any other DNA. 

While the use of BEC for genomic DNA is not a novel method, we 
successfully showed that the improved methodology can be used for 
genetic analysis and for state-of-the-art genomic technologies such as 
whole exome sequencing. 

The collection of DNA from BEC provides DNA of high quality and 
quantity for genomic studies. Furthermore, it allows for easy 
re-sampling in premature or newborn infants if an assay fails. 

Methods and Patients 

Patient selection. Following Institutional Review Board approval at 
Children's National Health System, parental consent was obtained for all patients 
with a gestational age of less than 36 weeks included in this study. Preterm infants 
were recruited at the Neonatal Intensive Care Unit (NICU) at Children's National 
Health System, a 54-bed, level IV NICU. All enrolled patients were nil per os (NPO) 
at the time of buccal swab and blood collection. Patient demographics are presented 
in Table 1. 

Sample collection and DNA extraction. Buccal swabs were collected from 85 
patients with a gestational age ranging from 24 to 36 weeks using cytology brushes. 
Briefly, 2 brushes were twirled on the inside of each cheek for less than 10 seconds. 
In addition, 0.75 mL of blood was collected from 61 patients. DNA was extracted from 
buccal swab and blood specimens using the Qiagen Buccal Cell Kit and DNeasy Kit 
(Qiagen Sciences, MD), respectively. We modified the buccal cell extraction protocol 
for each experiment set to accommodate a three-fold increase in sample processing, 
for a cell lysate volume of 900 µL. For blood, 100 µL of whole blood in EDTA was used 
for DNA extraction, corresponding to the upper volume limit recommended by the 
Qiagen DNA extraction kit for one set of experiments. 

Short tandem repeat (STR) analysis. DNA samples were diluted to a final concentration of 
0.5 ng/µL. One µL of each sample was amplified with AmpFlSTR® Identifiler® Plus 
(Applied Biosystems) with 2 µL of reaction mix, 1 µL of primer mix and 1 µL of dH2O in a 
5 µL final volume. The amplification program was 11 min at 95°C; 28 cycles of 20 s at 94°C 
and 3 min at 59°C; 10 min at 60°C; and a hold at 4°C. To prepare samples for electrophoresis, 
10 µL of LIZ 120 size standard was added to 400 µL of Hi-Di formamide (Applied 
Biosystems), and 1 µL of sample was added to 10 µL of the formamide/ILS mixture. 
The AmpFlSTR® Identifiler® Plus kit contains 15 STR systems (D8S1179, D21S11, 
D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, D2S1338, D19S433, vWA, 
TPOX, D18S51, D5S818 and FGA). Samples were electrophoresed on the 3130 
Genetic Analyzer (Applied Biosystems), using a 36 cm capillary and POP-7 polymer 
with injection parameters of 1.2 kV for 16 s. STR fragment analysis was performed using 
GeneMarker 2.4 (Softgenetics, PA). 
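
The comparison of matched profiles described above can be sketched as follows. This is an illustration only: the allele calls are hypothetical and are not study data, the helper names are ours, and the actual fragment analysis was done in GeneMarker. The sketch encodes the two criteria used in the text: a single-source profile has no more than two alleles at any locus, and matched samples carry the same alleles at every locus.

```python
def discordant_loci(profile_a, profile_b):
    """Return the loci at which two profiles do not carry the same alleles."""
    loci = sorted(set(profile_a) | set(profile_b))
    return [
        locus
        for locus in loci
        if set(profile_a.get(locus, ())) != set(profile_b.get(locus, ()))
    ]


def mixture_loci(profile):
    """Return the loci with more than two alleles, which suggests a mixed sample."""
    return sorted(
        locus for locus, alleles in profile.items() if len(set(alleles)) > 2
    )


# Hypothetical allele calls at three of the 16 markers (15 STR loci + amelogenin).
wb_profile = {"D8S1179": ("13", "14"), "TH01": ("6", "9.3"), "AMEL": ("X", "Y")}
bec_profile = {"D8S1179": ("14", "13"), "TH01": ("6", "9.3"), "AMEL": ("X", "Y")}
# Hypothetical contaminated sample: a third allele appears at TH01.
mixed_profile = {"D8S1179": ("13", "14"), "TH01": ("6", "7", "9.3"), "AMEL": ("X", "Y")}

print(discordant_loci(wb_profile, bec_profile))  # []
print(mixture_loci(mixed_profile))               # ['TH01']
```

Alleles are compared as sets, so the order in which the two alleles of a heterozygous locus are listed does not matter. Peak height ratios, which the Figure 1 legend also uses as evidence of a single source, are not modelled here.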

TaqMan probe-based assay. 10 ng of DNA obtained from the buccal swabs and 
whole blood was analyzed for six SNPs (rs1799983 [C___3219460_20], rs854560 
[C___2259750_20], rs1137101 [C___8722581_10], rs1815739 [C____590093_1_], 
rs1046502 [C___7577769_10] and rs4871385 [C__12060045_20]) according to the 
manufacturer's protocol (Life Technologies, CA). Briefly, this method employs the 5' 
nuclease activity of Taq polymerase to detect a fluorescent reporter signal generated 
during PCR. Data were collected on a Life Technologies 7900HT Sequence 
Detection System and analyzed using the SDS 2.4 software. 

PCR-restriction fragment length polymorphism (PCR-RFLP) analysis. 200 ng of 
BEC DNA from 85 premature infants was subjected to PCR using 5'-CTG TAG ATC 
GAG AGT GAG G-3' (forward primer) and 5'-GAT CGC TTG TCA GAT GGA TAG-3' 
(reverse primer). The PCR products were then digested with the restriction 
enzyme Bgl II (New England BioLabs, MA) to determine a polymorphism in the 
TRIM21 gene according to previously published data. 
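
The logic of the RFLP readout can be illustrated with a minimal in silico digest. Bgl II recognizes the palindromic site 5'-AGATCT-3' and cuts after the first A, so an allele that carries the site yields two fragments and an allele that lacks it stays uncut. The amplicon sequences below are hypothetical placeholders of the reported 485 bp length, not the TRIM21 sequence, and the position of the site within the amplicon is an assumption made for the example.

```python
BGLII_SITE = "AGATCT"  # Bgl II recognition sequence, 5'-A^GATCT-3'
CUT_OFFSET = 1         # the enzyme cuts after the first base of the site


def digest(sequence, site=BGLII_SITE, offset=CUT_OFFSET):
    """Return fragment lengths after complete digestion of a linear amplicon."""
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Hypothetical 485 bp amplicons: one allele with the site, one without it.
cut_allele = "C" * 200 + "AGATCT" + "G" * 279
uncut_allele = "C" * 200 + "AGACCT" + "G" * 279

print(digest(cut_allele))    # [201, 284]
print(digest(uncut_allele))  # [485]
```

Because the site is palindromic, scanning one strand is sufficient. On a gel, a heterozygous sample would show the uncut band together with both digestion fragments.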

Genomic DNA quantification and quality assessment. The quality of genomic DNA 
was assessed on a 1% agarose gel. Samples that passed the gel check proceeded to 
quantification on a Qubit 2.0 Fluorometer using the Qubit® dsDNA BR Assay Kit 
(Invitrogen, CA). 

Illumina DNA library preparation. DNA library preparation was completed using 
Illumina's TruSeq DNA Sample Prep v2 kit protocol (Illumina, CA). The DNA (1 µg) 
was randomly fragmented on the Covaris S220 (Covaris, MA) to insert sizes of 100 
to 900 bp. DNA quality was checked by analysis of samples on the Agilent 2100 
Bioanalyzer using a DNA High Sensitivity chip, followed by quantification on the 
Qubit 2.0 Fluorometer (Life Technologies). 

Exome enrichment was carried out using the standard protocol for the Illumina 
TruSeq Exome Enrichment Kit (Illumina, CA). Library pools were created by combining 
500 ng from each sample (only samples with different index adapters were pooled 
together). Following the Illumina TruSeq Exome Enrichment protocol, the quality of 
the final libraries was checked using an Agilent High Sensitivity DNA Bioanalyzer chip 
(Agilent, CA). The libraries were then quantified using the Qubit 2.0 Fluorometer and 
normalized to 1 ng/µL. 

Quantitative PCR (qPCR). The Kapa Biosystems Library Quantification Kit (Illumina/ 
ABI Prism) was used for the qPCR (Kapa Biosystems, MA). The quality of the final 
libraries was checked using an Agilent High Sensitivity DNA Bioanalyzer chip 
(Agilent, CA). The libraries were quantified using the Qubit 2.0 Fluorometer and 
normalized to 1 ng/µL. The qPCR was performed on the normalized library 
with a Life Technologies 7900HT Real Time PCR System to determine the 
concentration. All of the libraries were pooled together and normalized to a single 
concentration (4 nM). For the HiScanSQ analysis, seven pools were created, each 
containing four samples, for a total of 28 samples. qPCR was performed on the library 
pools using the ABI 7900HT Fast Real Time PCR System to validate the final 
concentration. Thermal cycling conditions were as follows: 95°C for 5 minutes, 
followed by 35 cycles of 95°C for 30 seconds and 60°C for 30 seconds. 
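
Moving from a mass concentration (ng/µL) to the molar concentration used for pooling (nM) requires the average library fragment length. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA. The fragment length, the example concentration and the final volume are hypothetical values chosen for illustration; none of them are given in the text, and in this study the final concentration was validated by qPCR rather than calculated.

```python
AVG_BP_MASS = 660.0  # approximate molar mass of one dsDNA base pair, g/mol


def molarity_nM(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA library concentration from ng/uL to nM."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)


def stock_volume_ul(stock_nM, target_nM, final_volume_ul):
    """Volume of stock needed to reach target_nM in final_volume_ul (C1*V1 = C2*V2)."""
    if target_nM > stock_nM:
        raise ValueError("stock is more dilute than the target concentration")
    return target_nM * final_volume_ul / stock_nM


# Hypothetical library: 2 ng/uL with a mean fragment length of 350 bp.
stock = molarity_nM(2.0, 350)
volume = stock_volume_ul(stock, target_nM=4.0, final_volume_ul=20.0)
print(f"{stock:.2f} nM stock; use {volume:.2f} uL in a 20 uL final volume")
```

The remainder of the final volume is made up with diluent. A library that falls below the 4 nM target cannot be normalized by dilution, which is why the helper raises an error in that case.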

Illumina cluster generation and sequencing. The Illumina cBot was used to 
hybridize the libraries to the flowcell and generate clusters. The flowcell was then 
loaded onto the Illumina HiScan along with the TruSeq SBS v3 200 cycle kit (Illumina, 
CA) and run on a 101 × 7 × 101 paired-end single multiplexed program. The exome 
sequencing analysis took approximately 10 days to finish. 

Genome analysis tool kit for exome analysis. We sequenced four samples on one 
lane of an Illumina HiScanSQ system, aligned the resulting reads to the hg19 
reference genome with the Burrows-Wheeler Aligner (BWA), applied Genome 
Analysis Toolkit (GATK) base quality score recalibration, insertion/deletion 
realignment and duplicate removal, and performed SNP and insertion-deletion 
discovery and genotyping across all four samples simultaneously using standard hard 
filtering parameters or variant quality score recalibration. 
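
The idea of hard filtering can be sketched as a set of threshold tests on per-variant annotations. The thresholds below are the generic SNP values commonly cited in GATK hard-filtering guidance (QD < 2.0, FS > 60.0, MQ < 40.0, MQRankSum < -12.5, ReadPosRankSum < -8.0). They are an assumption for illustration, since the text does not list the parameters actually used, and the function is a stand-alone stand-in rather than a GATK call.

```python
# Assumed thresholds: generic GATK-style SNP hard filters, not the study's settings.
SNP_HARD_FILTERS = {
    "QD": lambda value: value < 2.0,                # quality normalized by depth
    "FS": lambda value: value > 60.0,               # strand bias (Fisher's exact test)
    "MQ": lambda value: value < 40.0,               # RMS mapping quality
    "MQRankSum": lambda value: value < -12.5,       # mapping quality rank-sum test
    "ReadPosRankSum": lambda value: value < -8.0,   # read position rank-sum test
}


def failed_filters(annotations):
    """Return the names of the filters a variant fails; missing annotations are skipped."""
    return [
        name
        for name, fails in SNP_HARD_FILTERS.items()
        if name in annotations and fails(annotations[name])
    ]


# Hypothetical annotation values for two variants.
good_variant = {"QD": 20.0, "FS": 2.1, "MQ": 59.0, "MQRankSum": 0.1, "ReadPosRankSum": 0.3}
poor_variant = {"QD": 1.5, "FS": 75.0, "MQ": 60.0}

print(failed_filters(good_variant))  # []
print(failed_filters(poor_variant))  # ['QD', 'FS']
```

A variant is retained only when the returned list is empty. Variant quality score recalibration, the alternative named in the text, replaces these fixed cut-offs with a model trained on known variant sites.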

Statistical analysis. A two-tailed Student's t-test was used to compare yields between 
whole blood and buccal cells. 
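
For reference, the pooled-variance Student's t statistic can be computed with the Python standard library alone, as in the minimal sketch below. The yield values are hypothetical and are not study data, the function name is ours, and the sketch assumes the classical equal-variance form of the test.

```python
from math import sqrt
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Return the pooled-variance t statistic and its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Hypothetical DNA concentrations in ng/uL, for illustration only.
bec_yield = [250.0, 310.5, 180.2, 290.0]
wb_yield = [30.1, 12.4, 55.0, 41.3]

t_stat, dof = student_t(bec_yield, wb_yield)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

The two-tailed p-value is then read from the t distribution with the returned degrees of freedom, for example with a statistics package such as SciPy (`scipy.stats.ttest_ind` performs the whole test in one call).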